{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## A Gentle Introduction to use One-Class SVM for Anomaly Detection\n",
    "#### Author Nagdev Amruthnath\n",
    "Date: 1/9/2019\n",
    "\n",
    "#### Citation Info\n",
    "If you are using this for your research, please use the following for citation.\n",
    "\n",
    "Amruthnath, Nagdev, and Tarun Gupta. \"A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance.\" In 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), pp. 355-361. IEEE, 2018.\n",
    "\n",
    "#### Disclaimer\n",
    "This is a tutorial for performing fault detection using machine learning. You this code at your own risk. I do not gurantee that this would work as shown below. If you have any suggestions please branch this project.\n",
    "\n",
    "### Load Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "options(warn=-1)\n",
    "\n",
    "# load libraries\n",
    "library(dplyr)\n",
    "library(e1071)\n",
    "library(caret)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Load data\n",
    "Here we are using data from a bench press. There are total of four different states in this machine and they are split into four different csv files. We need to load the data first. In the data time represents the time between samples, ax is the acceleration on x axis, ay is the acceleration on y axis, az is the acceleration on z axis and at is the G's. The data was collected at sample rate of 100hz.\n",
    "\n",
    "Four different states of the machine were collected\n",
    "\n",
    "    1. Nothing attached to drill press\n",
    "    2. Wooden base attached to drill press\n",
    "    3. Imbalance created by adding weight to one end of wooden base\n",
    "    4. Imbalacne created by adding weight to two ends of wooden base."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<caption>A data.frame: 6 × 5</caption>\n",
       "<thead>\n",
       "\t<tr><th scope=col>time</th><th scope=col>ax</th><th scope=col>ay</th><th scope=col>az</th><th scope=col>aT</th></tr>\n",
       "\t<tr><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><td>0.002</td><td>-0.3246</td><td> 0.2748</td><td> 0.1502</td><td>0.451</td></tr>\n",
       "\t<tr><td>0.009</td><td> 0.6020</td><td>-0.1900</td><td>-0.3227</td><td>0.709</td></tr>\n",
       "\t<tr><td>0.019</td><td> 0.9787</td><td> 0.3258</td><td> 0.0124</td><td>1.032</td></tr>\n",
       "\t<tr><td>0.027</td><td> 0.6141</td><td>-0.4179</td><td> 0.0471</td><td>0.744</td></tr>\n",
       "\t<tr><td>0.038</td><td>-0.3218</td><td>-0.6389</td><td>-0.4259</td><td>0.833</td></tr>\n",
       "\t<tr><td>0.047</td><td>-0.3607</td><td> 0.1332</td><td>-0.1291</td><td>0.406</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A data.frame: 6 × 5\n",
       "\\begin{tabular}{r|lllll}\n",
       " time & ax & ay & az & aT\\\\\n",
       " <dbl> & <dbl> & <dbl> & <dbl> & <dbl>\\\\\n",
       "\\hline\n",
       "\t 0.002 & -0.3246 &  0.2748 &  0.1502 & 0.451\\\\\n",
       "\t 0.009 &  0.6020 & -0.1900 & -0.3227 & 0.709\\\\\n",
       "\t 0.019 &  0.9787 &  0.3258 &  0.0124 & 1.032\\\\\n",
       "\t 0.027 &  0.6141 & -0.4179 &  0.0471 & 0.744\\\\\n",
       "\t 0.038 & -0.3218 & -0.6389 & -0.4259 & 0.833\\\\\n",
       "\t 0.047 & -0.3607 &  0.1332 & -0.1291 & 0.406\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A data.frame: 6 × 5\n",
       "\n",
       "| time &lt;dbl&gt; | ax &lt;dbl&gt; | ay &lt;dbl&gt; | az &lt;dbl&gt; | aT &lt;dbl&gt; |\n",
       "|---|---|---|---|---|\n",
       "| 0.002 | -0.3246 |  0.2748 |  0.1502 | 0.451 |\n",
       "| 0.009 |  0.6020 | -0.1900 | -0.3227 | 0.709 |\n",
       "| 0.019 |  0.9787 |  0.3258 |  0.0124 | 1.032 |\n",
       "| 0.027 |  0.6141 | -0.4179 |  0.0471 | 0.744 |\n",
       "| 0.038 | -0.3218 | -0.6389 | -0.4259 | 0.833 |\n",
       "| 0.047 | -0.3607 |  0.1332 | -0.1291 | 0.406 |\n",
       "\n"
      ],
      "text/plain": [
       "  time  ax      ay      az      aT   \n",
       "1 0.002 -0.3246  0.2748  0.1502 0.451\n",
       "2 0.009  0.6020 -0.1900 -0.3227 0.709\n",
       "3 0.019  0.9787  0.3258  0.0124 1.032\n",
       "4 0.027  0.6141 -0.4179  0.0471 0.744\n",
       "5 0.038 -0.3218 -0.6389 -0.4259 0.833\n",
       "6 0.047 -0.3607  0.1332 -0.1291 0.406"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "setwd(\"/home/\")\n",
    "#read csv files\n",
    "file1 = read.csv(\"dry run.csv\", sep=\",\", header =T)\n",
    "file2 = read.csv(\"base.csv\", sep=\",\", header =T)\n",
    "file3 = read.csv(\"imbalance 1.csv\", sep=\",\", header =T)\n",
    "file4 = read.csv(\"imbalance 2.csv\", sep=\",\", header =T)\n",
    "head(file1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can look at the summary of each file using summary function in R. Below, we can observe that 66 seconds long data is available. We also have min, max and mean for each of the variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "      time               ax                  ay                  az          \n",
       " Min.   :  0.004   Min.   :-1.402700   Min.   :-1.693300   Min.   :-3.18930  \n",
       " 1st Qu.: 27.005   1st Qu.:-0.311100   1st Qu.:-0.429600   1st Qu.:-0.57337  \n",
       " Median : 54.142   Median : 0.015100   Median :-0.010700   Median :-0.11835  \n",
       " Mean   : 54.086   Mean   : 0.005385   Mean   :-0.002534   Mean   :-0.09105  \n",
       " 3rd Qu.: 81.146   3rd Qu.: 0.314800   3rd Qu.: 0.419475   3rd Qu.: 0.34815  \n",
       " Max.   :108.127   Max.   : 1.771900   Max.   : 1.515600   Max.   : 5.04610  \n",
       "       aT        \n",
       " Min.   :0.0360  \n",
       " 1st Qu.:0.6270  \n",
       " Median :0.8670  \n",
       " Mean   :0.9261  \n",
       " 3rd Qu.:1.1550  \n",
       " Max.   :5.2950  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# summary of each file\n",
    "summary(file2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Aggregration and feature extraction\n",
    "Here, the data is aggregated by 1 minute and features are extracted. Features are extracted to reduce the dimension of the data and only storing the representation of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<caption>A tibble: 6 × 21</caption>\n",
       "<thead>\n",
       "\t<tr><th scope=col>group</th><th scope=col>ax_mean</th><th scope=col>ax_sd</th><th scope=col>ax_min</th><th scope=col>ax_max</th><th scope=col>ax_median</th><th scope=col>ay_mean</th><th scope=col>ay_sd</th><th scope=col>ay_min</th><th scope=col>ay_may</th><th scope=col>⋯</th><th scope=col>az_mean</th><th scope=col>az_sd</th><th scope=col>az_min</th><th scope=col>az_maz</th><th scope=col>az_median</th><th scope=col>aT_mean</th><th scope=col>aT_sd</th><th scope=col>aT_min</th><th scope=col>aT_maT</th><th scope=col>aT_median</th></tr>\n",
       "\t<tr><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>⋯</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><td>0</td><td>-0.038164706</td><td>0.6594686</td><td>-1.2587</td><td>1.3821</td><td>-0.0955</td><td>-0.0682627451</td><td>0.7506785</td><td>-1.3892</td><td>1.6418</td><td>⋯</td><td>-0.13803333</td><td>0.9845115</td><td>-2.6753</td><td>2.7507</td><td> 0.0254</td><td>1.273216</td><td>0.5830149</td><td>0.400</td><td>3.029</td><td>1.0770</td></tr>\n",
       "\t<tr><td>1</td><td>-0.005806122</td><td>0.6325808</td><td>-1.6194</td><td>1.1943</td><td>-0.0015</td><td> 0.0037908163</td><td>0.7819044</td><td>-1.5625</td><td>1.5428</td><td>⋯</td><td>-0.20496837</td><td>0.9252188</td><td>-3.0774</td><td>2.7158</td><td>-0.2121</td><td>1.263622</td><td>0.5448447</td><td>0.410</td><td>3.197</td><td>1.1375</td></tr>\n",
       "\t<tr><td>2</td><td> 0.069845455</td><td>0.6665500</td><td>-1.4554</td><td>1.4667</td><td> 0.1070</td><td> 0.0744333333</td><td>0.8022922</td><td>-1.4800</td><td>1.7951</td><td>⋯</td><td>-0.06405354</td><td>0.9293866</td><td>-1.8205</td><td>2.4862</td><td>-0.1512</td><td>1.298364</td><td>0.5131552</td><td>0.255</td><td>2.644</td><td>1.2830</td></tr>\n",
       "\t<tr><td>3</td><td> 0.011552525</td><td>0.5511310</td><td>-1.9254</td><td>1.2034</td><td> 0.0675</td><td> 0.0008262626</td><td>0.7894209</td><td>-2.0042</td><td>1.5577</td><td>⋯</td><td>-0.09287879</td><td>0.8893505</td><td>-2.1562</td><td>3.2355</td><td>-0.1672</td><td>1.203848</td><td>0.5125826</td><td>0.393</td><td>3.322</td><td>1.1180</td></tr>\n",
       "\t<tr><td>4</td><td> 0.046688119</td><td>0.6426574</td><td>-1.7805</td><td>1.4837</td><td> 0.0836</td><td>-0.0177594059</td><td>0.7510811</td><td>-1.6629</td><td>1.4369</td><td>⋯</td><td>-0.13990000</td><td>0.9265720</td><td>-1.8515</td><td>3.5451</td><td>-0.1741</td><td>1.226267</td><td>0.5824608</td><td>0.313</td><td>3.597</td><td>1.1720</td></tr>\n",
       "\t<tr><td>5</td><td> 0.006678788</td><td>0.5780957</td><td>-1.4719</td><td>1.4355</td><td> 0.0536</td><td> 0.0013626263</td><td>0.7812245</td><td>-1.6293</td><td>1.6362</td><td>⋯</td><td>-0.16540000</td><td>0.9091516</td><td>-2.5561</td><td>2.9196</td><td>-0.2588</td><td>1.209515</td><td>0.5664847</td><td>0.336</td><td>3.035</td><td>1.1590</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A tibble: 6 × 21\n",
       "\\begin{tabular}{r|lllllllllllllllllllll}\n",
       " group & ax\\_mean & ax\\_sd & ax\\_min & ax\\_max & ax\\_median & ay\\_mean & ay\\_sd & ay\\_min & ay\\_may & ay\\_median & az\\_mean & az\\_sd & az\\_min & az\\_maz & az\\_median & aT\\_mean & aT\\_sd & aT\\_min & aT\\_maT & aT\\_median\\\\\n",
       " <fct> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl>\\\\\n",
       "\\hline\n",
       "\t 0 & -0.038164706 & 0.6594686 & -1.2587 & 1.3821 & -0.0955 & -0.0682627451 & 0.7506785 & -1.3892 & 1.6418 & -0.19000 & -0.13803333 & 0.9845115 & -2.6753 & 2.7507 &  0.0254 & 1.273216 & 0.5830149 & 0.400 & 3.029 & 1.0770\\\\\n",
       "\t 1 & -0.005806122 & 0.6325808 & -1.6194 & 1.1943 & -0.0015 &  0.0037908163 & 0.7819044 & -1.5625 & 1.5428 &  0.01005 & -0.20496837 & 0.9252188 & -3.0774 & 2.7158 & -0.2121 & 1.263622 & 0.5448447 & 0.410 & 3.197 & 1.1375\\\\\n",
       "\t 2 &  0.069845455 & 0.6665500 & -1.4554 & 1.4667 &  0.1070 &  0.0744333333 & 0.8022922 & -1.4800 & 1.7951 &  0.11860 & -0.06405354 & 0.9293866 & -1.8205 & 2.4862 & -0.1512 & 1.298364 & 0.5131552 & 0.255 & 2.644 & 1.2830\\\\\n",
       "\t 3 &  0.011552525 & 0.5511310 & -1.9254 & 1.2034 &  0.0675 &  0.0008262626 & 0.7894209 & -2.0042 & 1.5577 & -0.00270 & -0.09287879 & 0.8893505 & -2.1562 & 3.2355 & -0.1672 & 1.203848 & 0.5125826 & 0.393 & 3.322 & 1.1180\\\\\n",
       "\t 4 &  0.046688119 & 0.6426574 & -1.7805 & 1.4837 &  0.0836 & -0.0177594059 & 0.7510811 & -1.6629 & 1.4369 & -0.02530 & -0.13990000 & 0.9265720 & -1.8515 & 3.5451 & -0.1741 & 1.226267 & 0.5824608 & 0.313 & 3.597 & 1.1720\\\\\n",
       "\t 5 &  0.006678788 & 0.5780957 & -1.4719 & 1.4355 &  0.0536 &  0.0013626263 & 0.7812245 & -1.6293 & 1.6362 & -0.09910 & -0.16540000 & 0.9091516 & -2.5561 & 2.9196 & -0.2588 & 1.209515 & 0.5664847 & 0.336 & 3.035 & 1.1590\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A tibble: 6 × 21\n",
       "\n",
       "| group &lt;fct&gt; | ax_mean &lt;dbl&gt; | ax_sd &lt;dbl&gt; | ax_min &lt;dbl&gt; | ax_max &lt;dbl&gt; | ax_median &lt;dbl&gt; | ay_mean &lt;dbl&gt; | ay_sd &lt;dbl&gt; | ay_min &lt;dbl&gt; | ay_may &lt;dbl&gt; | ⋯ ⋯ | az_mean &lt;dbl&gt; | az_sd &lt;dbl&gt; | az_min &lt;dbl&gt; | az_maz &lt;dbl&gt; | az_median &lt;dbl&gt; | aT_mean &lt;dbl&gt; | aT_sd &lt;dbl&gt; | aT_min &lt;dbl&gt; | aT_maT &lt;dbl&gt; | aT_median &lt;dbl&gt; |\n",
       "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
       "| 0 | -0.038164706 | 0.6594686 | -1.2587 | 1.3821 | -0.0955 | -0.0682627451 | 0.7506785 | -1.3892 | 1.6418 | ⋯ | -0.13803333 | 0.9845115 | -2.6753 | 2.7507 |  0.0254 | 1.273216 | 0.5830149 | 0.400 | 3.029 | 1.0770 |\n",
       "| 1 | -0.005806122 | 0.6325808 | -1.6194 | 1.1943 | -0.0015 |  0.0037908163 | 0.7819044 | -1.5625 | 1.5428 | ⋯ | -0.20496837 | 0.9252188 | -3.0774 | 2.7158 | -0.2121 | 1.263622 | 0.5448447 | 0.410 | 3.197 | 1.1375 |\n",
       "| 2 |  0.069845455 | 0.6665500 | -1.4554 | 1.4667 |  0.1070 |  0.0744333333 | 0.8022922 | -1.4800 | 1.7951 | ⋯ | -0.06405354 | 0.9293866 | -1.8205 | 2.4862 | -0.1512 | 1.298364 | 0.5131552 | 0.255 | 2.644 | 1.2830 |\n",
       "| 3 |  0.011552525 | 0.5511310 | -1.9254 | 1.2034 |  0.0675 |  0.0008262626 | 0.7894209 | -2.0042 | 1.5577 | ⋯ | -0.09287879 | 0.8893505 | -2.1562 | 3.2355 | -0.1672 | 1.203848 | 0.5125826 | 0.393 | 3.322 | 1.1180 |\n",
       "| 4 |  0.046688119 | 0.6426574 | -1.7805 | 1.4837 |  0.0836 | -0.0177594059 | 0.7510811 | -1.6629 | 1.4369 | ⋯ | -0.13990000 | 0.9265720 | -1.8515 | 3.5451 | -0.1741 | 1.226267 | 0.5824608 | 0.313 | 3.597 | 1.1720 |\n",
       "| 5 |  0.006678788 | 0.5780957 | -1.4719 | 1.4355 |  0.0536 |  0.0013626263 | 0.7812245 | -1.6293 | 1.6362 | ⋯ | -0.16540000 | 0.9091516 | -2.5561 | 2.9196 | -0.2588 | 1.209515 | 0.5664847 | 0.336 | 3.035 | 1.1590 |\n",
       "\n"
      ],
      "text/plain": [
       "  group ax_mean      ax_sd     ax_min  ax_max ax_median ay_mean       ay_sd    \n",
       "1 0     -0.038164706 0.6594686 -1.2587 1.3821 -0.0955   -0.0682627451 0.7506785\n",
       "2 1     -0.005806122 0.6325808 -1.6194 1.1943 -0.0015    0.0037908163 0.7819044\n",
       "3 2      0.069845455 0.6665500 -1.4554 1.4667  0.1070    0.0744333333 0.8022922\n",
       "4 3      0.011552525 0.5511310 -1.9254 1.2034  0.0675    0.0008262626 0.7894209\n",
       "5 4      0.046688119 0.6426574 -1.7805 1.4837  0.0836   -0.0177594059 0.7510811\n",
       "6 5      0.006678788 0.5780957 -1.4719 1.4355  0.0536    0.0013626263 0.7812245\n",
       "  ay_min  ay_may ⋯ az_mean     az_sd     az_min  az_maz az_median aT_mean \n",
       "1 -1.3892 1.6418 ⋯ -0.13803333 0.9845115 -2.6753 2.7507  0.0254   1.273216\n",
       "2 -1.5625 1.5428 ⋯ -0.20496837 0.9252188 -3.0774 2.7158 -0.2121   1.263622\n",
       "3 -1.4800 1.7951 ⋯ -0.06405354 0.9293866 -1.8205 2.4862 -0.1512   1.298364\n",
       "4 -2.0042 1.5577 ⋯ -0.09287879 0.8893505 -2.1562 3.2355 -0.1672   1.203848\n",
       "5 -1.6629 1.4369 ⋯ -0.13990000 0.9265720 -1.8515 3.5451 -0.1741   1.226267\n",
       "6 -1.6293 1.6362 ⋯ -0.16540000 0.9091516 -2.5561 2.9196 -0.2588   1.209515\n",
       "  aT_sd     aT_min aT_maT aT_median\n",
       "1 0.5830149 0.400  3.029  1.0770   \n",
       "2 0.5448447 0.410  3.197  1.1375   \n",
       "3 0.5131552 0.255  2.644  1.2830   \n",
       "4 0.5125826 0.393  3.322  1.1180   \n",
       "5 0.5824608 0.313  3.597  1.1720   \n",
       "6 0.5664847 0.336  3.035  1.1590   "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "file1$group = as.factor(round(file1$time))\n",
    "file2$group = as.factor(round(file2$time))\n",
    "file3$group = as.factor(round(file3$time))\n",
    "file4$group = as.factor(round(file4$time))\n",
    "#(file1,20)\n",
    "\n",
    "#list of all files\n",
    "files = list(file1, file2, file3, file4)\n",
    "\n",
    "#loop through all files and combine\n",
    "features = NULL\n",
    "for (i in 1:4){\n",
    "res = files[[i]] %>%\n",
    "    group_by(group) %>%\n",
    "    summarize(ax_mean = mean(ax),\n",
    "              ax_sd = sd(ax),\n",
    "              ax_min = min(ax),\n",
    "              ax_max = max(ax),\n",
    "              ax_median = median(ax),\n",
    "              ay_mean = mean(ay),\n",
    "              ay_sd = sd(ay),\n",
    "              ay_min = min(ay),\n",
    "              ay_may = max(ay),\n",
    "              ay_median = median(ay),\n",
    "              az_mean = mean(az),\n",
    "              az_sd = sd(az),\n",
    "              az_min = min(az),\n",
    "              az_maz = max(az),\n",
    "              az_median = median(az),\n",
    "              aT_mean = mean(aT),\n",
    "              aT_sd = sd(aT),\n",
    "              aT_min = min(aT),\n",
    "              aT_maT = max(aT),\n",
    "              aT_median = median(aT)\n",
    "             )\n",
    "    features = rbind(features, res)\n",
    "}\n",
    "\n",
    "#view all features\n",
    "head(features)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Train and Test Set\n",
    "To build an anomaly detection model, a train and test set is required. Here, the normal condition of the data is used for training and remaining is used for testing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create train and test set\n",
    "train = features[1:67,2:ncol(features)]\n",
    "test = features[68:nrow(features),2:ncol(features)]\n",
    "\n",
    "trainLabels = rep(TRUE, 67)\n",
    "testLabels = rep(FALSE, nrow(features)-67)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### One-Class SVM\n",
    "Support vector machines (SVMs) are supervised learning models that analyze data and recognize patterns, and that can be used for both classification and regression tasks.\n",
    "\n",
    "Typically, the SVM algorithm is given a set of training examples labeled as belonging to one of two classes. An SVM model is based on dividing the training sample points into separate categories by as wide a gap as possible, while penalizing training samples that fall on the wrong side of the gap. The SVM model then makes predictions by assigning points to one side of the gap or the other.\n",
    "\n",
    "Sometimes oversampling is used to replicate the existing samples so that you can create a two-class model, but it is impossible to predict all the new patterns of fraud or system faults from limited examples. Moreover, collection of even limited examples can be expensive.\n",
    "\n",
    "Therefore, in one-class SVM, the support vector model is trained on data that has only one class, which is the “normal” class. It infers the properties of normal cases and from these properties can predict which examples are unlike the normal examples. This is useful for anomaly detection because the scarcity of training examples is what defines anomalies: that is, typically there are very few examples of the network intrusion, fraud, or other anomalous behavior.\n",
    "\n",
    "#### One-Class SVM using 'e1071'\n",
    "Now we know what one-class svm is, we can build a model for anomaly detection. svm function from e1071 can be used to build the model. This model like anyother model does require model tuning. Here we are tuning the model at different parameters of gamma and nu values. Also, if you noticed we are scaling the data as well. To train the model, we will be using radial function. The summary of the trained model si as shown below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "Call:\n",
       "svm.default(x = train, y = NULL, scale = T, type = \"one-classification\", \n",
       "    kernel = \"radial\", gamma = seq(0.01, 0.9, by = 0.01), nu = seq(0.01, \n",
       "        0.9, by = 0.01), na.action = na.omit)\n",
       "\n",
       "\n",
       "Parameters:\n",
       "   SVM-Type:  one-classification \n",
       " SVM-Kernel:  radial \n",
       "      gamma:  0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 \n",
       "         nu:  0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 \n",
       "\n",
       "Number of Support Vectors:  11\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# train the model\n",
    "model = svm(train, y = NULL, \n",
    "            scale = T, \n",
    "            type = 'one-classification', \n",
    "            na.action = na.omit, \n",
    "            kernel = 'radial',\n",
    "            gamma = seq(0.01, 0.9, by = 0.01),\n",
    "            nu = seq(0.01, 0.9, by=0.01)\n",
    "           )\n",
    "\n",
    "# view the trained model\n",
    "model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, the model is trained, we can do the predictions with all the data we have to see how the model performed and then we can create a confusion matrix. The results are as shown below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Confusion Matrix and Statistics\n",
       "\n",
       "          Reference\n",
       "Prediction FALSE TRUE\n",
       "     FALSE   295    0\n",
       "     TRUE      8   59\n",
       "                                          \n",
       "               Accuracy : 0.9779          \n",
       "                 95% CI : (0.9569, 0.9904)\n",
       "    No Information Rate : 0.837           \n",
       "    P-Value [Acc > NIR] : < 2e-16         \n",
       "                                          \n",
       "                  Kappa : 0.9232          \n",
       "                                          \n",
       " Mcnemar's Test P-Value : 0.01333         \n",
       "                                          \n",
       "            Sensitivity : 0.9736          \n",
       "            Specificity : 1.0000          \n",
       "         Pos Pred Value : 1.0000          \n",
       "         Neg Pred Value : 0.8806          \n",
       "             Prevalence : 0.8370          \n",
       "         Detection Rate : 0.8149          \n",
       "   Detection Prevalence : 0.8149          \n",
       "      Balanced Accuracy : 0.9868          \n",
       "                                          \n",
       "       'Positive' Class : FALSE           \n",
       "                                          "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# predict the data\n",
    "predictions = predict(model, features[,2:ncol(features)])\n",
    "\n",
    "# put data to a data frame\n",
    "results = data.frame(prediction = as.factor(predictions),actual = as.factor(c(trainLabels,testLabels)))\n",
    "\n",
    "# plot a confusion matrix\n",
    "confusionMatrix(results$actual, results$prediction)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Session info\n",
    "Below is the session info for the the packages and their versions used in this analysis. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "R version 3.3.3 (2017-03-06)\n",
       "Platform: x86_64-pc-linux-gnu (64-bit)\n",
       "Running under: Debian GNU/Linux 9 (stretch)\n",
       "\n",
       "locale:\n",
       " [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       \n",
       " [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   \n",
       " [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          \n",
       "[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   \n",
       "\n",
       "attached base packages:\n",
       "[1] stats     graphics  grDevices utils     datasets  methods   base     \n",
       "\n",
       "other attached packages:\n",
       "[1] caret_6.0-84    ggplot2_3.2.1   lattice_0.20-34 e1071_1.7-2    \n",
       "[5] dplyr_0.8.3    \n",
       "\n",
       "loaded via a namespace (and not attached):\n",
       " [1] pbdZMQ_0.3-3        tidyselect_0.2.5    repr_1.0.1         \n",
       " [4] purrr_0.3.2         reshape2_1.4.3      splines_3.3.3      \n",
       " [7] vctrs_0.2.0         colorspace_1.4-1    generics_0.0.2     \n",
       "[10] stats4_3.3.3        htmltools_0.3.6     base64enc_0.1-3    \n",
       "[13] survival_2.40-1     prodlim_2018.04.18  rlang_0.4.0        \n",
       "[16] ModelMetrics_1.2.2  pillar_1.4.2        glue_1.3.1         \n",
       "[19] withr_2.1.2         uuid_0.1-2          foreach_1.4.7      \n",
       "[22] plyr_1.8.4          lava_1.6.6          stringr_1.4.0      \n",
       "[25] timeDate_3043.102   munsell_0.5.0       gtable_0.3.0       \n",
       "[28] recipes_0.1.7       codetools_0.2-15    evaluate_0.14      \n",
       "[31] class_7.3-14        IRdisplay_0.7.0     Rcpp_1.0.2         \n",
       "[34] backports_1.1.4     scales_1.0.0        IRkernel_1.0.2.9000\n",
       "[37] ipred_0.9-9         jsonlite_1.6        digest_0.6.20      \n",
       "[40] stringi_1.4.3       grid_3.3.3          tools_3.3.3        \n",
       "[43] magrittr_1.5        lazyeval_0.2.2      tibble_2.1.3       \n",
       "[46] zeallot_0.1.0       crayon_1.3.4        pkgconfig_2.0.2    \n",
       "[49] MASS_7.3-45         Matrix_1.2-7.1      data.table_1.12.6  \n",
       "[52] lubridate_1.7.4     gower_0.2.1         assertthat_0.2.1   \n",
       "[55] iterators_1.0.12    R6_2.4.0            rpart_4.1-10       \n",
       "[58] nnet_7.3-12         nlme_3.1-129       "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sessionInfo()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
